ji 1
Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training
The mean field theory of multilayer neural networks centers around a particular infinite-width scaling, in which the learning dynamics is shown to be closely tracked by the mean field limit. A random fluctuation around this infinite-width limit is expected from a large-width expansion to the next order. This fluctuation has been studied only in the case of shallow networks, where previous works employ heavily technical notions or additional formulation ideas amenable only to that case. Treatment of the multilayer case has been missing, with the chief difficulty in finding a formulation that must capture the stochastic dependency across not only time but also depth. In this work, we initiate the study of the fluctuation in the case of multilayer networks, at any network depth.
Misspecifying non-compensatory as compensatory IRT: analysis of estimated skills and variance
Tamano, Hiroshi, Hino, Hideitsu, Mochihashi, Daichi
Multidimensional item response theory is a statistical test theory used to estimate the latent skills of learners and the difficulty levels of problems based on test results. Both compensatory and non-compensatory models have been proposed in the literature. Previous studies have revealed the substantial underestimation of higher skills when the non-compensatory model is misspecified as the compensatory model. However, the underlying mechanism behind this phenomenon has not been fully elucidated. It remains unclear whether overestimation also occurs and whether issues arise regarding the variance of the estimated parameters. In this paper, we aim to provide a comprehensive understanding of both underestimation and overestimation through a theoretical approach. In addition to the previously identified underestimation of the skills, we newly discover that the overestimation of skills occurs around the origin. Furthermore, we investigate the extent to which the asymptotic variance of the estimated parameters differs when considering model misspecification compared to when it is not taken into account.